2025-11-22

What is Least Angle Regression?

  • A regression algorithm for high-dimensional data
  • Builds the model incrementally like forward selection
  • But less greedy and more statistically efficient
  • Computes the entire LASSO path with a small modification
  • Produces smooth, piecewise-linear coefficient trajectories

Why Do We Need LARS?

Forward Selection Problems:

  • Commits too strongly to the first chosen variable
  • Struggles with correlated predictors
  • Greedy -> unstable -> can miss important variables

LARS Fixes This:

  • Takes controlled, geometric steps
  • Adjusts direction whenever correlations change
  • Never “overcommits” too early
  • Fair to correlated predictors

Key Idea

LARS moves in the direction that forms equal angles with all predictors most correlated with the residual.

  • A balanced “least angle” update
  • A clear sequence of variable entry
  • A smooth coefficient path

Setup and Notation

We model:

\[ \mu = X\beta \]

with the predictors standardized and the response centered, as is standard for LARS.

At any step:

  • Residual: \[ r = y - X\beta \]

  • Correlations with the residual: \[ c_j = x_j^\top r \]

  • Active set: \[ A = \{ j : |c_j| = \max_k |c_k| \} \]
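These quantities can be sketched directly in NumPy (a minimal toy example with synthetic data; the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)   # standardized, unit-norm columns
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(size=n)
y = y - y.mean()                    # centered response

beta = np.zeros(p)                  # start with all coefficients at zero
r = y - X @ beta                    # residual
c = X.T @ r                         # correlations with the residual
C = np.max(np.abs(c))               # maximal absolute correlation
active = np.where(np.isclose(np.abs(c), C))[0]  # active set A

print(active, C)                    # most-correlated predictor(s) at step 0
```

At step zero the residual is just the centered response, so the active set holds the single predictor most correlated with \(y\).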

The Equiangular Direction

To update the model, LARS finds a vector \(u_A\) such that:

  • Every active predictor makes the same angle with \(u_A\)
  • Each active predictor has the same correlation with the update

Mathematically:

\[ u_A = X_A w_A, \quad w_A = \frac{G_A^{-1} 1_A}{\sqrt{1_A^\top G_A^{-1}1_A}} \]

  • \(X_A\) is the matrix of active predictors
  • \(G_A = X_A^\top X_A\)
  • \(1_A\) is a vector of ones

This ensures a balanced movement among active predictors.
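The formula above can be checked numerically. A small sketch (synthetic data; it assumes the first two predictors are active with positive signs):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 3))
X = X / np.linalg.norm(X, axis=0)    # unit-norm columns

# Suppose the first two predictors are currently active.
XA = X[:, :2]
GA = XA.T @ XA                       # Gram matrix G_A
ones = np.ones(GA.shape[0])          # 1_A
Ginv_ones = np.linalg.solve(GA, ones)
wA = Ginv_ones / np.sqrt(ones @ Ginv_ones)
uA = XA @ wA                         # equiangular direction u_A

# u_A has unit length, and every active predictor has the same
# inner product with it (hence the same angle).
print(np.linalg.norm(uA), XA.T @ uA)
```

The equal entries of `XA.T @ uA` are exactly the "balanced movement" property: no active predictor is favored over another.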

Updating the Model

The fitted values update via:

\[ \mu \leftarrow \mu + \gamma\, u_A \]

where the step size \(\gamma\) is the smallest positive value at which an inactive predictor catches up:

  • All active predictors remain tied throughout the move
  • At \(\gamma\), a new predictor reaches the same absolute correlation and joins the active set

Thus the active set expands exactly when it should.
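A sketch of this step-size computation, following the standard LARS formula (toy data; `lars_step_size` is an illustrative helper, not a library function, and it assumes unit-norm columns and a sign-adjusted direction):

```python
import numpy as np

def lars_step_size(X, r, active, uA):
    """Smallest positive gamma at which an inactive predictor ties the
    active set's absolute correlation with the residual."""
    c = X.T @ r                       # current correlations
    C = np.max(np.abs(c))             # shared active correlation
    a = X.T @ uA                      # correlations with the direction
    AA = np.abs(a[active[0]])         # equal |a_j| for every active j
    candidates = []
    for j in range(X.shape[1]):
        if j in active:
            continue
        for val in ((C - c[j]) / (AA - a[j]), (C + c[j]) / (AA + a[j])):
            if np.isfinite(val) and val > 1e-12:
                candidates.append(val)
    return min(candidates) if candidates else C / AA  # final step -> OLS

rng = np.random.default_rng(2)
n, p = 200, 4
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)     # centered, unit-norm columns
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=n)
y = y - y.mean()

c = X.T @ y
C = np.max(np.abs(c))
j0 = int(np.argmax(np.abs(c)))
uA = np.sign(c[j0]) * X[:, j0]        # one active predictor: u_A = s * x_j
gamma = lars_step_size(X, y, [j0], uA)

# After moving gamma along u_A, a second predictor ties the first:
c_new = X.T @ (y - gamma * uA)
print(np.sort(np.abs(c_new))[-2:])    # two (nearly) equal top values
```

The two candidate ratios per inactive predictor cover the cases where its correlation approaches the active level from below or from above (in sign).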

Geometric Picture

Intuition: What LARS Does

  1. Start with all coefficients = 0
  2. Find the predictor most correlated with the response
  3. Move toward it until another predictor becomes equally correlated
  4. Turn and move along a direction splitting the angle between them
  5. Repeat until all predictors enter

This movement is piecewise-linear: each turn starts a new straight segment of the coefficient path.
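The five steps above are what scikit-learn's `lars_path` implements; a quick sketch on synthetic data (the data is made up, the API calls are standard sklearn):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

# Small synthetic problem with a few informative predictors.
X, y = make_regression(n_samples=100, n_features=6, n_informative=3,
                       noise=5.0, random_state=0)

# lars_path returns the breakpoints (alphas), the order in which
# variables enter (active), and the piecewise-linear coefficient path.
alphas, active, coefs = lars_path(X, y, method="lar")

print("entry order:", active)        # sequence of variable entry
print("path shape:", coefs.shape)    # (n_features, n_breakpoints)
```

Each column of `coefs` is the coefficient vector at one breakpoint; plotting the columns against `alphas` gives the familiar piecewise-linear path diagram.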

Connection to LASSO

LARS becomes the LASSO algorithm by adding:

If a nonzero coefficient is about to cross zero, drop that variable from the active set and recompute the equiangular direction (it may re-enter later).

This enforces the L1 constraint.

LARS + zero-crossing rule = Exact LASSO path
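Both variants are available through the same scikit-learn function, differing only in the `method` argument (synthetic data; the API calls are standard sklearn):

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p = 120, 5
X = rng.normal(size=(n, p))
y = X @ np.array([4.0, -3.0, 2.0, 0.0, 0.0]) + rng.normal(size=n)

# method="lar"   -> plain LARS path (variables only ever enter)
# method="lasso" -> LARS plus the zero-crossing drop rule = exact LASSO path
alphas_lar, _, coefs_lar = lars_path(X, y, method="lar")
alphas_lasso, _, coefs_lasso = lars_path(X, y, method="lasso")

print(coefs_lar.shape, coefs_lasso.shape)
```

The LASSO path may contain extra breakpoints where a coefficient hits zero and its variable leaves the active set; with no drops the two paths coincide, and both end at the unregularized least squares solution.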

When LARS is Appropriate

  • High-dimensional data with many predictors: Excellent for datasets where the number of predictors exceeds the number of observations (\(p > n\)).

  • Variable selection is a priority: Provides a clear and complete solution path showing the sequence and importance of features entering the model.

  • Computational efficiency matters: Computes the full regularization path at roughly the cost of a single least squares fit, \(O(k^3 + pk^2)\) for \(k\) steps over \(p\) predictors.

  • Collinearity among predictors: Effectively manages correlated predictors by moving them together, helping to identify meaningful feature groups.

Real-World Examples: LARS is Appropriate

  • Genomics: Gene expression analysis with thousands of genes but few samples (\(p \gg n\)).

  • Financial modeling: Portfolio optimization with hundreds of assets and economic indicators.

  • Medical imaging: Disease prediction from high-dimensional MRI/CT features with limited patient data.

  • Marketing analytics: Customer behavior modeling with many potential predictors (demographics, interactions, preferences).

When LARS is NOT Appropriate

  • Non-linear relationships: Assumes linearity; unsuitable for data with non-linear patterns between features and the response.

  • Categorical variables dominate: Best suited for continuous predictors; extensive categorical data may not fully leverage LARS’s benefits.

  • Outliers are present: Highly susceptible to outliers, which can significantly distort the regression path (consider robust regression alternatives).

  • Temporal dependencies: Fails to capture the trends, seasonality, or autocorrelation inherent in time-series data.

  • Very large sample size, few features: Standard linear regression is often simpler and equally effective when observations heavily outweigh the number of features.

Real-World Examples: LARS is NOT Appropriate

  • Image recognition: Deep non-linear relationships between pixels and object classes.

  • Stock price prediction: Time-series with autocorrelation, trends, and temporal dependencies.

  • Survey data with outliers: Data quality issues where extreme responses distort the model.

  • Simple A/B testing: Few variables with large sample sizes where standard regression suffices.

Main Dataset

This project uses the Burke et al. (2022) global urban soil black carbon dataset, obtained from the Knowledge Network for Biocomplexity (KNB) at: https://knb.ecoinformatics.org/view/urn:uuid:1651eeb1-e050-4c78-8410-ec2389ca2363

The dataset pulls together measurements of black carbon in urban soils from cities around the world. Each row includes details like latitude/longitude, elevation, precipitation, soil temperature at different depths, land-cover type, population info, and notes from the original studies. The main sheet (“Urban Black Carbon”) contains 600+ observations and about 65 variables, giving us a wide mix of environmental and geographic predictors.

Because many of these variables move together (climate, location, soil traits, etc.), the dataset naturally has clusters of correlated features, which makes it a solid fit for demonstrating Least Angle Regression (LARS).

Data Dictionary

We removed variables with more than 90% missing values to avoid unstable predictors and to keep a consistent sample size across variables. This threshold preserved the essential environmental predictors while excluding sparse fields that carried too little information to contribute to modeling.
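In pandas this filter is a one-liner. A minimal sketch (the frame and column names here are hypothetical stand-ins for the "Urban Black Carbon" sheet, which would really be loaded with `pd.read_excel(...)`):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real sheet.
df = pd.DataFrame({
    "bc": [1.2, 0.8, 3.4, 0.5],
    "depth_cm": [0.5, 1.0, 4.0, 10.0],
    "sparse_field": [np.nan] * 4,     # 100% missing -> should be dropped
})

# Keep only columns with less than 90% missing values.
keep = df.columns[df.isna().mean() < 0.90]
df_clean = df[keep]
print(list(df_clean.columns))         # ['bc', 'depth_cm']
```

`df.isna().mean()` gives the per-column missing fraction, so the threshold is applied uniformly across all variables.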

BC vs Depth

(Figure: scatterplot of black carbon vs. soil depth with a fitted smooth trend, `y ~ x`.)

Takeaways

  • High BC values occur almost entirely near the surface (0–1 cm), reflecting strong urban deposition.

  • BC drops sharply with depth, flattening to low levels beyond ~4 cm.

  • Variance is very large at shallow depths, but nearly zero deeper in the profile.

  • Pattern indicates a nonlinear depth–BC relationship and potential heteroskedasticity.

  • Supports LARS: depth correlates with other environmental factors, and regularization helps stabilize selection in the presence of such gradients.

Predictor Correlation Heatmap

Takeaways

  • Strong spatial correlation between latitude and longitude.

  • The two soil temperature variables are highly correlated.

  • Elevation and precipitation follow the same environmental gradient.

  • Overall: clear multicollinearity clusters, meaning several predictors share similar information. This supports using LARS.
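The correlation matrix behind such a heatmap is straightforward to compute. A sketch with hypothetical predictors mimicking the clusters noted above (the real dataset's columns would be substituted in):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300
lat = rng.uniform(-60, 70, n)
lon = 1.5 * lat + rng.normal(0, 20, n)          # spatially tied to latitude
soil_t_5cm = rng.normal(15, 8, n)
soil_t_15cm = soil_t_5cm + rng.normal(0, 1, n)  # tracks the shallower depth

df = pd.DataFrame({"lat": lat, "lon": lon,
                   "soil_t_5cm": soil_t_5cm, "soil_t_15cm": soil_t_15cm})

corr = df.corr()          # Pearson correlation matrix behind the heatmap
print(corr.round(2))
```

Rendering it is then a single call, e.g. `seaborn.heatmap(corr)`; the blocks of high off-diagonal values are the multicollinearity clusters that LARS handles gracefully.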